How to adjust an ensemble size in stream data mining?

نویسندگان

  • Lena Pietruczuk
  • Leszek Rutkowski
  • Maciej Jaworski
  • Piotr Duda
چکیده

In this paper we propose a new approach for designing an ensemble applied to stream data classification. Our approach is supported by two theorems showing how to decide whether a new component should be added to the ensemble or not, based on the assumption that such an action should increase the accuracy of the ensemble not only for the current portion of observations but also for the whole (infinite) data stream. The conclusions of these theorems hold with a certain probability (confidence) set by the user. Through computer simulations, among others, we show that decreasing the confidence that decision based on the finite portion of the stream is the same as based on the whole (infinite) data stream only slightly improves the accuracy at the expense of significant memory consumption. Moreover, we will introduce a novel procedure of weighting ensemble components, i.e. decision trees, by assigning a weight to each leaf of the tree. In previous approaches a weight was assigned to the whole ensemble component. The new approach is based on the observation that probability of the correct tree outcome is different in various tree sections.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...

متن کامل

Mining Concept-Drifting Data Stream to Detect Peer to Peer Botnet Traffic

We propose a novel stream data classification technique to detect Peer to Peer botnet. Botnet traffic can be considered as stream data having two important properties: infinite length and drifting concept. Thus, stream data classification technique is more appealing to botnet detection than simple classification technique. However, no other botnet detection approaches so far have applied stream...

متن کامل

New Ensemble Method for Classification of Data Streams

Classification of data streams has become an important area of data mining, as the number of applications facing these challenges increases. In this paper, we propose a new ensemble learning method for data stream classification in presence of concept drift. Our method is capable of detecting changes and adapting to new concepts which appears in the stream. Data stream classification; concept d...

متن کامل

Handling Gradual Concept Drift in Stream Data

Data streams are sequence of data examples that continuously arrive at time-varying and possibly unbound streams. These data streams are potentially huge in size and thus it is impossible to process many data mining techniques (e.g., sensor readings, call records, web page visits). Tachiniques for classification fail to successfully process data streams because of two factors: their overwhelmin...

متن کامل

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has been increased. Credit card fraud is a crucial problem in banking and its danger is over increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Inf. Sci.

دوره 381  شماره 

صفحات  -

تاریخ انتشار 2017